Overview

Dataset statistics

Number of variables: 6
Number of observations: 125497040
Missing cells: 21657651
Missing cells (%): 2.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.7 GiB
Average record size in memory: 32.0 B
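
The derived figures in this block can be cross-checked from the raw counts. The sketch below recomputes the missing-cell percentage and the memory totals. The per-value byte widths are an assumption inferred from the per-variable memory sizes listed under Variables (957.5, 239.4 and 478.7 MiB); the report itself does not state column dtypes.

```python
# Counts copied from the dataset statistics.
n_rows = 125_497_040
n_cols = 6
missing_cells = 21_657_651

# Share of all cells (rows x columns) that are missing.
missing_pct = 100 * missing_cells / (n_rows * n_cols)
print(f"Missing cells: {missing_pct:.1f}%")

# ASSUMPTION: bytes per value for each column, inferred from the reported
# per-variable memory sizes, not stated anywhere in the report.
bytes_per_value = {
    "id": 8,
    "date": 8,
    "store_nbr": 2,
    "item_nbr": 4,
    "unit_sales": 8,
    "onpromotion": 2,
}

record_bytes = sum(bytes_per_value.values())
total_gib = record_bytes * n_rows / 2**30
print(f"Record size: {record_bytes} B, total: {total_gib:.1f} GiB")

# Per-column sizes in MiB, for comparison with the "Memory size" entries.
for name, width in bytes_per_value.items():
    print(f"{name}: {width * n_rows / 2**20:.1f} MiB")
```

Under these assumed widths the record size and total agree with the 32.0 B and 3.7 GiB reported above.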

Variable types

Numeric: 4
DateTime: 1
Boolean: 1

Alerts

Imbalance: onpromotion is highly imbalanced (61.5%)
Missing: onpromotion has 21657651 (17.3%) missing values
Skewed: unit_sales is highly skewed (γ₁ = 582.2246437)
Uniform: id is uniformly distributed
Unique: id has unique values
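
The report does not say how the 61.5% imbalance figure is derived. One definition that reproduces it from the onpromotion counts listed under Variables (96028767 False, 7810622 True, missing values excluded) is one minus the Shannon entropy of the class proportions, normalized by log2 of the number of classes. The sketch below rests on that assumption; the function name `imbalance_score` is illustrative.

```python
import math


def imbalance_score(counts):
    """Return 1 - normalized Shannon entropy of the class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class
    dominates. ASSUMPTION: this is the definition behind the report's
    "Imbalance" alert; the report does not state it.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(n_classes)


# Non-missing onpromotion counts from the report: False, True.
score = imbalance_score([96_028_767, 7_810_622])
print(f"Imbalance: {score:.1%}")
```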

Reproduction

Analysis started: 2026-01-06 17:18:27.148143
Analysis finished: 2026-01-06 17:26:08.059061
Duration: 7 minutes and 40.91 seconds
Software version: ydata-profiling v4.18.0
Download configuration: config.json

Variables

id
Real number (ℝ)

Uniform  Unique 

Distinct: 125497040
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 62748520
Minimum: 0
Maximum: 1.2549704 × 10⁸
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 957.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 6274852
Q1: 31374260
Median: 62748520
Q3: 94122779
95-th percentile: 1.1922219 × 10⁸
Maximum: 1.2549704 × 10⁸
Range: 1.2549704 × 10⁸
Interquartile range (IQR): 62748520

Descriptive statistics

Standard deviation: 36227875
Coefficient of variation (CV): 0.57735028
Kurtosis: -1.2
Mean: 62748520
Median Absolute Deviation (MAD): 31374260
Skewness: -9.4342576 × 10⁻¹⁷
Sum: 7.8747535 × 10¹⁵
Variance: 1.3124589 × 10¹⁵
Monotonicity: Strictly increasing
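
Because id is unique and strictly increasing from 0, its statistics should match the closed forms for the integers 0 to n-1. The sketch below evaluates those closed forms (population versions) for comparison with the figures above. The report may use sample estimators with n-1 corrections, but at this n the difference is far below the printed precision.

```python
import math

n = 125_497_040  # number of rows; ids run from 0 to n - 1

# Closed forms for the discrete uniform distribution on 0 .. n-1.
mean = (n - 1) / 2
variance = (n * n - 1) / 12
std = math.sqrt(variance)
cv = std / mean  # tends to 1/sqrt(3) for large n
excess_kurtosis = -6 * (n * n + 1) / (5 * (n * n - 1))  # tends to -1.2

print(f"mean            = {mean:.1f}")
print(f"std             = {std:.1f}")
print(f"variance        = {variance:.7e}")
print(f"cv              = {cv:.8f}")
print(f"excess kurtosis = {excess_kurtosis:.4f}")
```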
Histogram with fixed size bins (bins=50)
Value                     Count      Frequency (%)
0                         1          < 0.1%
1                         1          < 0.1%
2                         1          < 0.1%
3                         1          < 0.1%
4                         1          < 0.1%
5                         1          < 0.1%
6                         1          < 0.1%
7                         1          < 0.1%
8                         1          < 0.1%
9                         1          < 0.1%
Other values (125497030)  125497030  > 99.9%

Value      Count  Frequency (%)
0          1      < 0.1%
1          1      < 0.1%
2          1      < 0.1%
3          1      < 0.1%
4          1      < 0.1%
5          1      < 0.1%
6          1      < 0.1%
7          1      < 0.1%
8          1      < 0.1%
9          1      < 0.1%

Value      Count  Frequency (%)
125497039  1      < 0.1%
125497038  1      < 0.1%
125497037  1      < 0.1%
125497036  1      < 0.1%
125497035  1      < 0.1%
125497034  1      < 0.1%
125497033  1      < 0.1%
125497032  1      < 0.1%
125497031  1      < 0.1%
125497030  1      < 0.1%

date
Date

Distinct: 1684
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 957.5 MiB
Minimum: 2013-01-01 00:00:00
Maximum: 2017-08-15 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=50)

store_nbr
Real number (ℝ)

Distinct: 54
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.464578
Minimum: 1
Maximum: 54
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 239.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 12
Median: 28
Q3: 43
95-th percentile: 51
Maximum: 54
Range: 53
Interquartile range (IQR): 31

Descriptive statistics

Standard deviation: 16.33051
Coefficient of variation (CV): 0.59460263
Kurtosis: -1.3568904
Mean: 27.464578
Median Absolute Deviation (MAD): 16
Skewness: -0.074194851
Sum: 3.4467232 × 10⁹
Variance: 266.68557
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count     Frequency (%)
44                 3513089   2.8%
45                 3484244   2.8%
47                 3457407   2.8%
3                  3401264   2.7%
46                 3353890   2.7%
49                 3342531   2.7%
8                  3261184   2.6%
48                 3236523   2.6%
50                 3192566   2.5%
6                  3089799   2.5%
Other values (44)  92164543  73.4%

Value  Count    Frequency (%)
1      2562153  2.0%
2      2987840  2.4%
3      3401264  2.7%
4      2830554  2.3%
5      2666691  2.1%
6      3089799  2.5%
7      2921204  2.3%
8      3261184  2.6%
9      2773790  2.2%
10     1740482  1.4%

Value  Count    Frequency (%)
54     1648867  1.3%
53     1938255  1.5%
52     290581   0.2%
51     2960031  2.4%
50     3192566  2.5%
49     3342531  2.7%
48     3236523  2.6%
47     3457407  2.8%
46     3353890  2.7%
45     3484244  2.8%

item_nbr
Real number (ℝ)

Distinct: 4036
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 972769.15
Minimum: 96995
Maximum: 2127114
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 478.7 MiB

Quantile statistics

Minimum: 96995
5-th percentile: 177395
Q1: 522383
Median: 959500
Q3: 1354380
95-th percentile: 1964356
Maximum: 2127114
Range: 2030119
Interquartile range (IQR): 831997

Descriptive statistics

Standard deviation: 520533.6
Coefficient of variation (CV): 0.53510496
Kurtosis: -0.78499653
Mean: 972769.15
Median Absolute Deviation (MAD): 404376
Skewness: 0.21928968
Sum: 1.2207965 × 10¹⁴
Variance: 2.7095523 × 10¹¹
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count      Frequency (%)
502331               83475      0.1%
314384               83450      0.1%
364606               83308      0.1%
265559               83047      0.1%
559870               82513      0.1%
1036689              82134      0.1%
273528               82108      0.1%
564533               82086      0.1%
261052               81774      0.1%
414353               81755      0.1%
Other values (4026)  124671390  99.3%

Value   Count  Frequency (%)
96995   5229   < 0.1%
99197   4902   < 0.1%
103501  35841  < 0.1%
103520  53175  < 0.1%
103665  50449  < 0.1%
105574  40322  < 0.1%
105575  41311  < 0.1%
105576  39959  < 0.1%
105577  30113  < 0.1%
105693  51730  < 0.1%

Value    Count  Frequency (%)
2127114  247    < 0.1%
2126944  5      < 0.1%
2126842  12     < 0.1%
2124052  704    < 0.1%
2123863  12     < 0.1%
2123859  10     < 0.1%
2123839  13     < 0.1%
2123791  21     < 0.1%
2123790  8      < 0.1%
2123775  64     < 0.1%

unit_sales
Real number (ℝ)

Skewed 

Distinct: 258474
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.5548653
Minimum: -15372
Maximum: 89440
Zeros: 0
Zeros (%): 0.0%
Negative: 7795
Negative (%): < 0.1%
Memory size: 957.5 MiB

Quantile statistics

Minimum: -15372
5-th percentile: 1
Q1: 2
Median: 4
Q3: 9
95-th percentile: 29
Maximum: 89440
Range: 104812
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 23.605152
Coefficient of variation (CV): 2.7592663
Kurtosis: 1796939.4
Mean: 8.5548653
Median Absolute Deviation (MAD): 3
Skewness: 582.22464
Sum: 1.0736103 × 10⁹
Variance: 557.20319
Monotonicity: Not monotonic
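
Several of the unit_sales figures are tied to one another by definition: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the row count. The sketch below recomputes them from the rounded values printed in the report, so small last-digit differences are expected.

```python
# Rounded values copied from the report.
n_rows = 125_497_040
mean = 8.5548653
std = 23.605152

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance from standard deviation
total = mean * n_rows    # sum of unit_sales over all rows

print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.5f}")
print(f"Sum      = {total:.7e}")
```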
Histogram with fixed size bins (bins=50)
Value                  Count     Frequency (%)
1                      23444825  18.7%
2                      17749070  14.1%
3                      13263841  10.6%
4                      10216998  8.1%
5                      7958957   6.3%
6                      6423645   5.1%
7                      5078334   4.0%
8                      4163234   3.3%
9                      3403350   2.7%
10                     2879594   2.3%
Other values (258464)  30915192  24.6%

Value      Count  Frequency (%)
-15372     1      < 0.1%
-10002     1      < 0.1%
-4673      1      < 0.1%
-3606      1      < 0.1%
-3600      1      < 0.1%
-3451.363  1      < 0.1%
-2487      1      < 0.1%
-2400      2      < 0.1%
-1943      1      < 0.1%
-1806      1      < 0.1%

Value  Count  Frequency (%)
89440  1      < 0.1%
44142  1      < 0.1%
30000  1      < 0.1%
20748  1      < 0.1%
20000  1      < 0.1%
17146  1      < 0.1%
16000  1      < 0.1%
15375  1      < 0.1%
15000  1      < 0.1%
14483  1      < 0.1%

onpromotion
Boolean

Imbalance  Missing 

Distinct: 2
Distinct (%): < 0.1%
Missing: 21657651
Missing (%): 17.3%
Memory size: 239.4 MiB
Value      Count     Frequency (%)
False      96028767  76.5%
True       7810622   6.2%
(Missing)  21657651  17.3%

Interactions

[Figures: 16 pairwise interaction plots]

Correlations

             id      item_nbr  onpromotion  store_nbr  unit_sales
id           1.000   0.302     0.152        0.023      -0.050
item_nbr     0.302   1.000     0.073        0.014      -0.004
onpromotion  0.152   0.073     1.000        0.024      0.001
store_nbr    0.023   0.014     0.024        1.000      0.079
unit_sales   -0.050  -0.004    0.001        0.079      1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

           id         date        store_nbr  item_nbr  unit_sales  onpromotion
0          0          2013-01-01  25         103665    7.0         <NA>
1          1          2013-01-01  25         105574    1.0         <NA>
2          2          2013-01-01  25         105575    2.0         <NA>
3          3          2013-01-01  25         108079    1.0         <NA>
4          4          2013-01-01  25         108701    1.0         <NA>
5          5          2013-01-01  25         108786    3.0         <NA>
6          6          2013-01-01  25         108797    1.0         <NA>
7          7          2013-01-01  25         108952    1.0         <NA>
8          8          2013-01-01  25         111397    13.0        <NA>
9          9          2013-01-01  25         114790    3.0         <NA>

           id         date        store_nbr  item_nbr  unit_sales  onpromotion
125497030  125497030  2017-08-15  54         2086882   1.0         False
125497031  125497031  2017-08-15  54         2087409   3.0         False
125497032  125497032  2017-08-15  54         2087978   8.0         False
125497033  125497033  2017-08-15  54         2088922   7.0         False
125497034  125497034  2017-08-15  54         2089036   4.0         False
125497035  125497035  2017-08-15  54         2089339   4.0         False
125497036  125497036  2017-08-15  54         2106464   1.0         True
125497037  125497037  2017-08-15  54         2110456   192.0       False
125497038  125497038  2017-08-15  54         2113914   198.0       True
125497039  125497039  2017-08-15  54         2116416   2.0         False